iPSCs derived from infertile men carrying complex genetic abnormalities can generate primordial germ-like cells

Despite increasing insight into the genetics of infertility, the developmental disease processes remain unclear due to the lack of adequate experimental models. The advent of induced pluripotent stem cell (iPSC) technology has provided a unique tool for in vitro disease modeling, enabling major advances in our understanding of developmental disease processes. We report the full characterization of complex genetic abnormalities in two infertile patients with either azoospermia or XX male syndrome and we identify genes of potential interest implicated in their infertility. Using the erythroblasts of both patients, we generated primed iPSCs and converted them into a naive-like pluripotent state. Naive iPSCs were then differentiated into primordial germ-like cells (PGC-LCs). The expression of the early PGC marker genes SOX17, CD38, NANOS3, c-KIT, TFAP2C, and D2-40 confirmed progression towards the early germline stage. Our results demonstrate that iPSCs from two infertile patients with significant genetic abnormalities are capable of efficient production of PGC-LCs. Such an in vitro model of infertility should help identify causative factors leading to early germ cell development failure and provide a valuable tool to explore novel therapeutic strategies.

www.nature.com/scientificreports/ Infertility is a major public health issue, affecting 10-15% of couples in reproductive age worldwide. In half of the cases, infertility is due to a male factor 1,2 with varying degrees of severity. The most extreme form is non-obstructive azoospermia (NOA), which is characterized by the complete absence of sperm in the ejaculate. Genetic causes have been attributed to microdeletions in the azoospermia factor (AZF) region 3 , chromosomal abnormalities, single gene mutations, or multifactorial inheritance 4 . Disorders of sex development (DSD) are rare congenital conditions characterized by a complete or partial mismatch between genetic and phenotypic sex, and represent another cause of infertility 5 . One of these DSD is the XX male syndrome, a rare congenital intersex condition in individuals with male phenotype expression and variable clinical presentations, ranging from ambiguous to normal male genitalia and infertility 6,7 . The fate of the testes is determined by the sex-determining region Y (SRY) gene in the Y chromosome. Approximately 80% of 46,XX DSD patients are SRY-positive, with most cases resulting from SRY gene translocation onto the pseudo-autosomal region PAR1 of one X chromosome 8 . In contrast, the genetic causes of most cases of 46,XX SRY-negative DSD patients remain unknown.
Research on gametogenesis responds to a real need in reproductive health. The development of biological models that comprehensively reproduce the early stages of germ cell differentiation would enable a better understanding of the molecular mechanisms regulating normal spermatogenesis and those underlying infertility. In mammals, primordial germ cells (PGCs) are the precursors of the germline lineage that gives rise to gametes 9,10 . In mice, signals from the proximal extra-embryonic ectoderm (BMP4 and BMP8b) and the visceral endoderm (BMP2) play an essential role in inducing PGCs 10 . In particular, BMP4, which is produced by extra-embryonic mesoderm, controls the survival and localization of PGCs 11 . PGC specification is characterized by multiple key events; the conversion of 5-methylcytosine (5mC) into 5-hydroxymethylcytosine (5hmC) by Ten-eleven translocation (TET) proteins constitutes the first step of the DNA demethylation pathway 12 . However, little is known about PGC formation in humans due to the poor accessibility of early human embryos.
In stem cell research, induced pluripotent stem cells (iPSCs) provide an opportunity to reproduce pathological processes in vitro and open new avenues for understanding pathological states at the molecular level. Many groups have attempted to study normal human germ cell differentiation using human embryonic stem cells (hESCs) or hiPSCs with varying degrees of success, eventually resulting in breakthroughs in the field [13][14][15][16][17][18][19][20][21] . However, the generation of PGCs in the context of infertility has been poorly addressed. The most impressive work reported so far was performed on iPSCs derived from infertile male mice with trisomy XXY or XYY. In that study, trisomic cells were shown to lose their extra chromosome during reprogramming, and the sperm generated gave rise to healthy and fertile offspring 22 . To date, very few studies have explored hiPSCs in human male infertility, and only one recent study reported the generation of germ cell-like cells from azoospermic men carrying microdeletions in the AZF region 23 . Generating patient-specific iPSCs with the same genetic background as infertile donors provides unprecedented human models for studying genetic disorders of infertility, and assessing their potential to differentiate into PGC-LCs is an essential step in understanding infertility.
The purpose of this study was to determine whether infertile men could produce PGCs within the framework of genetic disorders. Genetic abnormalities of two patients were analyzed using conventional cytogenetic and molecular techniques as well as whole genome sequencing. Several genes potentially implicated in the infertility and/or DSD of the patients were identified. Subsequently, patient erythroblasts were reprogrammed into primed iPSCs to obtain the unique cellular model that enables assessment of the ability of each patient's genetic program to produce PGCs. Primed iPSCs were first converted into a naive-like state before inducing their differentiation into germinal cells. Our results show that the iPSCs of both patients give rise to PGC-like cells (PGC-LCs), indicating that infertile men with distinct genetic abnormalities are capable of robust production of PGCs. This study also demonstrates that iPSC technology provides a powerful biological tool to identify early events of embryonic development that may be responsible for infertility and could be used to explore innovative therapeutic strategies. Further specific investigations are needed to determine whether these PGC-LCs can form functional and fertile gametes, but this remains a challenging task, especially for human PGCs, due to the many ethical considerations that are raised in this field.

Results
Conventional cytogenetic and molecular analysis. Patient 1 had non-obstructive azoospermia (see Methods for the complete clinical description) and carried a complex chromosomal rearrangement (CCR) involving chromosomes 7 and 12 24 . Additional results using whole genome sequencing enabled the identification of a total of seven breakpoints and one deletion located at the site of the 7p21.3 breakpoint (Fig. 1a). These breakpoints were associated with three events: insertion of a segment of the long arm of chromosome 12 (12q12q23.2) into the short arm of chromosome 7 (7p21.3), a pericentric inversion of chromosome 12 involving the 12p13.31q23.2 segment, and a paracentric inversion of chromosome 12 involving the 12q23.2 segment (Fig. 1a,b). The deletion identified within the 7p21.3 breakpoint does not involve genes of interest. In the 12q23.2 region, four breakpoints were identified, two of which occurred within the UTP20 and ARL1 genes; however, these two genes are not related to reproductive function (Fig. 1a). Chromosome 12 breakpoint mapping enabled the identification of a candidate gene, SYCP3, located approximately 300 kb from the chr12:101,835,719 breakpoint (Fig. 1a). SYCP3 encodes an element of the synaptonemal complex that is essential for chromosomal pairing during prophase I of meiosis. In addition, 1 M array-CGH and a custom-designed array-CGH were performed; however, no significant DNA copy number variations that may account for non-obstructive azoospermia were identified.
Patient 2 is a male with DSD, including ambiguous genitalia (see Methods for the complete clinical description). Conventional analysis revealed a 46,XX karyotype (Fig. 1c). Fluorescence in situ hybridization (FISH) performed on lymphocytes and a buccal swab using probes for chromosomes 18, X and Y confirmed the absence of chromosome Y material (Fig. 1d). FISH and PCR analysis confirmed the absence of the SRY gene in the patient's genome (Fig. 1e and data not shown, respectively). The 1 M array-CGH and the custom-designed array-CGH also failed to identify SRY sequences or any significant DNA copy number variation that may account for the DSD phenotype. Next, whole genome sequencing (WGS) analysis was performed for patient 2, revealing three rare variants of potential interest, all single-nucleotide polymorphisms (SNPs) (Table 1). The first SNP identified is a "probably damaging" heterozygous missense variant (Polyphen) in exon 3 of the AMH gene (c.482G>A), a gene associated with persistent Müllerian duct syndrome. This disorder affects normally virilized males who develop female sex organs (uterus and fallopian tubes) due to persistence of Müllerian duct derivatives during fetal development (#OMIM261550). This syndrome is caused by heterozygous mutations inactivating the genes encoding AMH or the AMH receptor (AMHR2) (#OMIM261550). The second SNP is a heterozygous missense mutation (c.1030G>A) in exon 12 of the NUP107 gene. Mutations in this gene have been reported to cause XX gonadal dysgenesis (#OMIM618078). We also identified a "possibly damaging" heterozygous missense mutation (c.691G>A) in STAT5B, which is known to act in many biological processes and is associated with growth hormone insensitivity (#OMIM618985). Thus, WGS highlights rare genetic variants that could assist in deciphering the complex phenotypic traits of the DSD patient.
(d) FISH analysis on buccal swab using centromeric probes for chromosome 18 (blue, 2 signals), chromosome X (green, 2 signals) and chromosome Y (red, no signal) (patient 2). (e) FISH analysis on buccal swab using probes for the chromosome X centromere (blue, 2 signals), the chromosome Y heterochromatic region (green, no signal) and the SRY gene locus on chromosome Y (red, no signal) (patient 2).
Primed iPSCs generation and characterization. Primed iPSCs from the DSD patient (patient 2) were generated from erythroblasts following the protocol previously described for patient 1 24 . We investigated the pluripotency of five primed iPSC clones from patient 2 by analyzing the expression of the endogenous pluripotency marker genes SOX2, OCT4 and NANOG by RT-PCR. The pluripotency markers were strongly expressed in all five iPSC clones but not in the primary cells (PBMCs) of the patient (Fig. 2a). Immunofluorescence revealed specific staining for the SSEA4, TRA-1-60, and OCT3/4 pluripotency markers in iPSC colonies (Fig. 2b). Therefore, the iPSC clones derived from the infertile DSD patient expressed conventional pluripotency markers, indicating successful reprogramming.
To ascertain pluripotency, we explored the teratoma-forming potential of two iPSC clones: patient 2 cl.15 and cl.20. Patient 1 cl.12 and cl.32 were described previously 24 . Histological analysis of the tumors produced revealed differentiation of the cells into endodermal, ectodermal and mesodermal tissues, with a histologic mix of glandular gut-like epithelium, neural tissue and large cartilaginous areas (Fig. 2c). The tissues were well-differentiated in all the structures observed, without signs of malignancy.
Thus, both in vitro testing for pluripotency markers and in vivo testing for teratoma formation confirmed the pluripotent state and efficient reprogramming of the iPSC clones analyzed.
To assess the genomic stability of the iPSCs, both iPSC lines were karyotyped. The complex chromosomal rearrangement involving chromosomes 7 and 12 of patient 1 was still present, as was the 46,XX karyotype of patient 2 (Supplementary Fig. S1). Importantly, this demonstrates that the reprogramming process did not induce additional chromosomal abnormalities. To investigate possible CNV instability during somatic reprogramming, we performed pangenomic 1 M array-CGH. The CNV analysis did not identify any significant change in copy number.

Conversion of primed hiPSCs into a naive state of pluripotency. Pluripotent stem cells exist in a naive or primed state, respectively represented by pre-implantation epiblast stem cells and developmentally more advanced post-implantation epiblast stem cells (EpiSCs). The primed state represents a more advanced, "differentiated" state of pluripotency than the naive state and shows lower differentiation efficiency 21,25,26 . For this reason, we focused on converting primed hiPSCs into a naive state as described in Fig. 3a. The protocol for naive iPSC conversion was performed as reported by Gafni et al., 2013 27 . Three clones of patient 1 hiPSCs and two clones of patient 2 hiPSCs were subjected to conversion by providing exogenous stimulation with bFGF, LIF, TGFβ, and small-molecule inhibitors of GSK3, MEK, MAPK, and JNK (termed 4i medium) for the induction and maintenance of human naive stem cell pluripotency. After 2 weeks of culture and maintenance through single-cell dissociation passages, naive state conversion generated cells exhibiting naive-like iPSC characteristics, with the morphology changing from flat to domed. This specific colony phenotype and the ability of the cells to tolerate single-cell dissociation are a strong indication of successful conversion of primed iPSCs to a naive state, but we also assessed the expression of the KLF4 and TFCP2L1 genes, known to be expressed in the naive state 28 . RT-PCR results showed a weak increase of KLF4, which is already highly expressed in primed iPSCs, but a significant increase of TFCP2L1 expression in the patients' naive iPSCs compared to the primed cells, confirming the successful conversion of the patients' iPSCs from the primed to the naive state (Supplementary Fig. S2).

Embryoid body formation and PGC-LCs differentiation.
To assess the germline differentiation potential of the patients' naive hiPSCs, embryoid body (EB) aggregation in 96-well low-attachment plates was performed in the presence of BMP2/4, LIF, SCF, and EGF to induce hPGC-LC differentiation (Fig. 3a). BMP2/4 have been reported to promote PGC differentiation from PSCs in EB cultures 10,11 . When transferred into PGC medium, cells within EBs differentiated into PGC-LCs within 4 days. Next, we assessed the presence of germ cells in the EBs by RT-qPCR and immunohistochemistry. RT-qPCR was performed to assess the expression of the pluripotency gene NANOG and the early PGC markers NANOS3 and SOX17 in undifferentiated naive iPSCs and in EBs at day 4 of differentiation from a control, patient 1 and patient 2. The results showed a statistically significant decrease of NANOG in the EBs of patient 1, as well as increased expression of the early PGC markers NANOS3 and SOX17, the critical specifier of human PGC-LCs, compared to the naive iPSCs (Fig. 3b,c). The results obtained from control and patient 2 EBs were not statistically significant but showed a trend similar to that of the patient 1 samples (Supplementary Fig. S3).
We also investigated by RT-qPCR the expression of the pluripotency gene OCT4, the early PGC marker genes TFAP2C, D2-40, CD38, and KIT, and the late PGC marker genes DAZL and DDX4; the results are presented in Supplementary Fig. S3.
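The RT-qPCR comparisons above (EBs at day 4 versus undifferentiated naive iPSCs) are fold-change comparisons. A minimal sketch of one common way to compute such a fold change, the 2^-ΔΔCt convention, is given below. This section does not state the quantification method actually used, so the method, the function name and all Ct values are assumptions for illustration only.

```python
# Relative expression by the 2^-ddCt convention.
# ASSUMPTION: this section does not state the quantification method used;
# this is a generic illustration, not the authors' pipeline.

def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change of a target gene in a sample versus a calibrator.

    Each Ct is first normalized to a reference (housekeeping) gene,
    then the sample is compared with the calibrator condition.
    """
    d_ct_sample = ct_target_sample - ct_ref_sample
    d_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    dd_ct = d_ct_sample - d_ct_calibrator
    return 2.0 ** (-dd_ct)


# Hypothetical Ct values (not data from the study):
# sample = EBs at day 4, calibrator = undifferentiated naive iPSCs.
fold_change = relative_expression(
    ct_target_sample=24.0, ct_ref_sample=18.0,
    ct_target_calibrator=29.0, ct_ref_calibrator=18.0,
)
print(f"fold change: {fold_change:.1f}")
```

With these made-up values the target gene crosses threshold five cycles earlier in the EBs once normalized, which corresponds to a 32-fold increase. A fold change below 1 would indicate downregulation, as reported here for NANOG in patient 1 EBs.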
We next evaluated the epigenetic reprogramming status of PGC-LCs by assessing the proportion and distribution of DNA 5hmC in cells of the patients' EBs by double immunostaining. The immunofluorescence assay showed increased 5hmC staining in PGC-LCs compared with neighboring somatic D2-40-negative cells in the EBs of patients 1 and 2 (Fig. 4c). Some surrounding D2-40-negative cells also showed 5hmC staining, which could be explained by these cells being in the process of differentiating towards the germ cell lineage at day 4. The presence of 5hmC signals in PGC-LCs is consistent with differentiation towards the germline lineage.
Expression analysis of the candidate genes in patients 1 and 2. To determine the potential involvement of the candidate genes identified by WGS in the phenotypes of patients 1 and 2, we evaluated their expression by RT-qPCR in the EBs of the patients and the control. Our results did not show a major alteration of SYCP3 gene expression compared to the control, at least at the PGC-LC stage (Fig. 5a). We also investigated the expression of the three candidate genes identified in patient 2 and found a non-significant increase of NUP107 and STAT5B compared to the control. We observed a more marked upward trend in AMH gene expression in patient 2 EBs compared with the control (Fig. 5b). We compared AMH expression in primed iPSCs of patient 2 to that of two naive control iPSCs (46,XX and 46,XY); the results showed that AMH expression in patient 2 iPSCs was intermediate between the levels in the 46,XX and 46,XY control iPSCs (Supplementary Fig. S4).

Discussion
To decipher the etiology of male infertility, we performed a full genomic characterization of the genetic abnormalities of two infertile men and identified potential loci and/or genes linked to infertility or sexual development. Through iPSC-derived models, we indirectly showed that infertile men harboring complex genetic abnormalities could produce PGC-LCs. This result is novel and significant to the field, as it enables further investigation of the developmental process of gamete differentiation and of the stage at which it is blocked. Patient 1 is a CCR carrier, which could alone explain the azoospermia, as the structural abnormalities might disrupt chromosome pairing at the pachytene stage during meiosis. Indeed, checkpoint mechanisms operate during the meiotic process. Failure of chromosome synapsis generally leads to spermatogenic arrest at the mid-pachytene stage of meiotic prophase or at the metaphase stage of the first meiotic division, resulting in subfertility or infertility. Failure to detect such abnormalities in chromosome pairing can lead to aneuploidy, one of the major causes of embryonic loss. However, the testicular biopsy of patient 1 revealed the presence of a few spermatocytes and extremely rare spermatids, attesting that the first and even the second division of meiosis occur despite the CCR, albeit at a low level. In patient 1, a breakpoint was identified at a distance of 300 kb from the 3' end of the SYCP3 gene, but it did not occur in a known regulatory domain of the gene. Nevertheless, the breakpoint may have disrupted regulatory elements of the gene and, therefore, its expression. However, our RT-qPCR results showed no major alteration of SYCP3 expression in the patient 1 EBs. We cannot exclude that this is because PGC-LCs do not yet robustly express meiotic genes.
SYCP3 is a major player in germ cell differentiation, as it enables key processes such as recombination and meiotic chromosome segregation during gamete production; it belongs to the synaptonemal complex that binds homologous chromosomes together at pachytene (synapsis), but was also found to be expressed in ESCs and iPSCs 33 . As heterozygous mutations in the SYCP3 gene have been reported to be associated with azoospermia 34,35 , a mis-regulation of SYCP3 expression in patient 1 could be involved in his spermatogenic failure. However, to determine whether the SNP identified in the SYCP3 gene had an impact on the infertility of patient 1, further experiments are required, including differentiation towards meiotic or more mature germ cells and forced expression of SYCP3, for instance using a specific lentiviral vector.
The clinical picture of the 46,XX DSD patient and level of testosterone are consistent with testicular tissue development: either testis development or the co-existence of both testicular and ovarian tissue in the same gonad (ovotestis). In any event, the patient's testicular tissue produced, in utero, insufficient testosterone for the full masculinization of the genitalia, leading to the ambiguous external genitalia and impaired development of Wolffian duct derivatives.
WGS, as well as the array-CGH, enabled confirmation of the 46,XX karyotype without SRY sequences in the entire genome of the patient. Although the WGS approach succeeded in identifying several genes potentially involved in the patient's phenotypic traits, the causative gene or the molecular mechanisms responsible for the patient's sexual reversion were not highlighted. While some DSD cases can be explained by a single causal variant, many other cases remain unexplained, potentially resulting from different mutations, which, in combination, have tilted the balance towards male differentiation.
Three rare variants of potential interest that could be linked to the patient's phenotype were identified by WGS, in the STAT5B, NUP107 and AMH genes. However, no significant change in STAT5B gene expression was observed in the patient's EBs compared to the control EBs. STAT5B is involved in many biological processes, and its mutations are associated with severe growth hormone insensitivity resulting in short stature (OMIM#245590) 36 . Patient 2 harbors a rare missense mutation within exon 7 of the STAT5B gene. The patient's height was 1.55 m, suggesting that the STAT5B mutation might be involved in this specific phenotype.
The expression of AMH and, to a lesser extent, NUP107 was slightly increased in the DSD EBs compared to the 46,XX control EBs, but without statistical significance. The NUP107 variant (exon 12) is present at a low frequency in the gnomAD database (2.122 × 10⁻⁵) and represents a potential variant of interest. A rare homozygous missense variant in exon 12 of the NUP107 gene (c.1063C>T, p.R355C) has been reported in two sisters affected with primary amenorrhea and hypergonadotropic hypogonadism 37 . However, the results do not support a disruption of NUP107 gene expression, at least at the PGC-LC stage.
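To give a sense of scale for the gnomAD allele frequency quoted above, the sketch below converts an allele frequency into expected genotype frequencies. It assumes Hardy-Weinberg equilibrium and that the database frequency applies to the population of interest; neither assumption is made in the text, and the function name is hypothetical.

```python
# Expected genotype frequencies from an allele frequency.
# ASSUMPTIONS: Hardy-Weinberg equilibrium (random mating, no selection)
# and a population matching the reference database. Illustration only.

def hardy_weinberg(allele_freq):
    """Return expected heterozygous and homozygous carrier frequencies."""
    p = allele_freq          # frequency of the rare allele
    q = 1.0 - p              # frequency of the common allele
    return {"heterozygous": 2.0 * p * q, "homozygous": p * p}


# Allele frequency reported in the text for the NUP107 variant.
freqs = hardy_weinberg(2.122e-5)
one_in = round(1.0 / freqs["heterozygous"])

print(f"heterozygous carriers: {freqs['heterozygous']:.3e} (about 1 in {one_in})")
print(f"homozygous carriers:   {freqs['homozygous']:.3e}")
```

Under these assumptions a heterozygous carrier is expected roughly once in a few tens of thousands of individuals, while homozygosity is several orders of magnitude rarer. Consanguinity, as reported for the family of patient 2, violates the random-mating assumption and would raise the expected homozygous frequency.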
AMH plays a critical role in normal sexual differentiation and can be useful in the initial evaluation of suspected DSD in childhood. However, AMH expression has not yet been investigated in the diagnosis of DSD in adults. In males, the onset of puberty leads to the downregulation of AMH expression due to the increased intratesticular production of androgens and of the androgen receptor. In females, the level of AMH expression after puberty remains relatively stable until the third decade of life. We assessed the expression of the AMH gene in primed iPSCs from patient 2 and two control iPSCs (46,XX and 46,XY). The level of AMH expression in the DSD cells is intermediate between male and female control iPSCs, suggesting that AMH expression could have contributed to the abnormal Müllerian duct development in this 46,XX patient, in whom female internal reproductive organs were reduced to a vaginal cavity segment. Nevertheless, the AMH variant alone is not sufficient to explain sex reversal. AMH is expressed by Sertoli cells and is involved in the regression of the Müllerian duct, which would otherwise develop into the uterus, fallopian tubes and upper vagina. Thus, in the context of our 46,XX DSD patient, the abnormal increase in AMH expression could explain the presence of residual Müllerian tissue and, more importantly, is consistent with the presence of testicular tissue in his gonad. Specific phenotypes, such as DSD, are more likely the result of multiple genetic factors; however, those leading to atypical sex development are often not identified due to insufficient knowledge of the pathogenesis and underlying mechanisms. Thus, it is conceivable that additional mutations have contributed to the DSD phenotype of patient 2.
To model reproductive pathologies from these complex genetic landscapes, we first attempted to differentiate PGC-LCs from primed iPSCs, but all our attempts failed (data not shown). Since the naive pluripotent state overcomes several hurdles encountered with the primed pluripotent state, relating to differentiation capability, single-cell passaging, and gene editing efficiency 38 , we derived, for the first time to our knowledge, naive iPSCs from these two patients. So far, two strategies, based on the forced expression of specific genes and/or the addition of specific factors to the medium, have been used to induce male germ cells in vitro. Previous studies have demonstrated the capacity of normal hiPSCs to differentiate into PGCs 17,19,20 , and into gonocytes 22 , SSCs 39 , and meiotic germ cells 18,19,[39][40][41] . To date, PGC-LCs or mature germ cells within the context of karyotype abnormalities have not been generated. In our study, naive iPSCs were successfully differentiated into PGC-LCs through EB formation in the presence of the essential factors BMP2/4, LIF, EGF, and SCF. Indeed, we observed high expression of the early PGC markers TFAP2C and D2-40. Remarkably, SOX17, the human-specific key regulator of PGC-LC specification 21,42,43 , was shown to be significantly upregulated within EBs, indicating that our culture conditions were effective in inducing hPGC-LCs from both patients' naive iPSCs.
Although the proportion of positive germ cells within EBs was variable, the PGC fate was achieved in all differentiation assays. Furthermore, up to 30% hPGC-LCs were generated, a yield of the same order as that reported by Irie and colleagues, who obtained approximately 46% hPGC-LCs in a normal genetic context 21 . We also showed an increase of DNA 5hmC staining in the PGC-LCs during germ cell differentiation from the infertile patients' iPSCs, supporting the successful derivation of PGC-LCs from those patients. These results indicate that the complete loss of fertility of the two patients may not be attributable to abnormalities of PGC formation. Furthermore, human PGC-LCs were shown to be produced from both XX and XY cells, in male infertility with meiotic arrest and with gonadal sex reversal, respectively.
Regarding patient 1, since SYCP3 gene is known to be highly expressed during meiosis, further differentiation of iPSCs-derived PGCs through meiosis is necessary to determine whether there is an abnormal SYCP3 expression beyond the PGC stage, and, therefore, its potential involvement in altered meiosis. Driving germinal differentiation further may provide an answer for the underlying cause of meiotic arrest and a better understanding of CCR pairing and checkpoint mechanism behavior during meiosis. In mice, functional gametes leading to healthy and fertile mice can be generated from iPSC derived PGC-LCs. However, the use of human PGC-LCs for xenografts in animal models or fecundation and progeny development to investigate whether these cells could give rise to functional gametes is not realistically feasible due to legal, philosophical, societal, and ethically evident challenges, even in the context of infertility.
For patient 2, we assume that the SRY-negative 46,XX PGC-LCs generated from his own iPSCs would most likely be unable to undergo meiosis, as the Y chromosome plays a major role in male fertility. Indeed, deletions of specific regions of the Y chromosome were linked to early failure of spermatogenesis and, consequently, to infertility [44][45][46] . For example, microdeletions of the AZF regions in Yq11 represent one of the most frequent genetic causes of azoospermia or severe oligospermia 47 . Therefore, it remains to be evaluated whether the PGC-LCs of patient 2 could undergo meiosis to differentiate into more mature male germ cells if AZF sequences are added. In view of this, an interesting question is raised, as it may be more conceivable to produce oocytes from the XX male's PGCs. This alternative would raise evident ethical concerns and question the acceptability of such an alternative for the patient. Hence, this highlights the absolute need for reflection and debate generated by such a new technology. Such an option also requires better knowledge of gene expression and epigenetic modifications in germ cells, which have been previously poorly studied because of the lack of an appropriate model, and can now be investigated as a result of the generation of PGCs from infertile men in the present study.
In conclusion, our results indicate that naive iPSCs derived from two infertile patients characterized by important genetic abnormalities are capable of efficient production of PGC-LCs. However, it remains to be elucidated whether SYCP3 (patient 1), AMH and NUP107 (patient 2) are involved in the absence of mature gametes in the patients. Nevertheless, these results are extremely encouraging and reinforce the need to further decipher the molecular mechanisms responsible for the lack of gametes. Moreover, we believe that the naive iPSC model introduced in this study would be useful to explore new therapeutic strategies, such as high-throughput screening of drug molecules, to overcome the blockage of gamete differentiation.

Materials and methods
Patients and controls. Patient 1 was 38 years old and consulted for infertility after he and his partner had been trying to conceive for 2 years. The patient was the first child of unrelated parents, and he had four brothers and five sisters whose fertility status could not be determined because of their personal situations (they were younger and not actively trying to procreate). Clinical examination excluded obstruction of the genital tract but revealed marked bilateral atrophy of the gonads, with a testicular volume of only 7 cm³ (normal range: 20-25 cm³). Semen was collected after a requested five-day period of abstinence. Two spermograms performed eight months apart revealed azoospermia. Testicular Doppler ultrasound and ultrasound scans of the deep genital tract showed hypervascularization of the prostate, with calcification of the central zone, slight differential thickening of the pelvic walls, but without obstruction, and non-retentive vesicles. Laboratory tests revealed hormonal dysregulation, with a low serum concentration of inhibin B (36 pg/mL; normal range: 80-270 pg/mL) and a high serum concentration of FSH (15.5 IU/L; normal range: 1.4-10 IU/L). The serum concentration of LH was normal (7.9 IU/L; normal range: 1.4-8 IU/L). Bilateral testicular biopsy was performed, and histological analysis showed maturation arrest in all the seminiferous tubules, mostly at the spermatocyte stage and more rarely at the spermatid stage. Thus, meiosis was initiated but not completed. In addition, testicular Leydig cell hyperplasia was observed. Patient 2 was 44 years old and belonged to a consanguineous family, as his parents were first cousins. The patient was of shorter stature than other members of his family.
At birth, he presented a urogenital malformation associated with persistence of a large vaginal cavity (4-5 cm deep) implanted in the posterior urethra below the sphincter, requiring many surgical interventions. At the age of 21, examination showed short stature with a male-oriented phenotype, a small penis (6 cm long), and two atrophic gonads (left and right gonads with volumes of 2 mL and 0.9 mL, respectively) in a scrotal position about 1 cm high. Radiological and endoscopic explorations were carried out, revealing the persistence of the vaginal cavity of urogenital sinus type, implanted in the posterior urethra below the sphincter. Hormonal examination revealed high FSH (33 IU/L; normal range: 1.4-10 IU/L) and LH (56 IU/L; normal range: 1.4-8 IU/L) values, and a low serum testosterone concentration (1.4 ng/mL; normal range: 2.50-9.50 ng/mL). The implantation of the urogenital sinus on the posterior side of the urethra caused episodes of urinary incontinence, recurrent urinary infections, dysuria and burning urination due to infections. In addition, the patient had sexual problems with retrograde ejaculation. The ablation of this Müllerian residue was therefore carried out at the age of 33. Unfortunately, despite our request, the parents were not available.
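The hormone values reported for the two patients can be checked against their stated normal ranges with a few lines of code. The sketch below only restates the numbers given in the clinical descriptions; the class and field names are hypothetical, and the testosterone value and range are assumed to share one unit (ng/mL), since the unit labels in the source do not match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assay:
    """One serum measurement with its stated normal range."""
    name: str
    value: float
    low: float
    high: float
    unit: str

    def flag(self) -> str:
        """Classify the value relative to the normal range."""
        if self.value < self.low:
            return "low"
        if self.value > self.high:
            return "high"
        return "normal"


# Values and ranges as reported in the clinical descriptions.
patient_1 = [
    Assay("inhibin B", 36, 80, 270, "pg/mL"),
    Assay("FSH", 15.5, 1.4, 10, "IU/L"),
    Assay("LH", 7.9, 1.4, 8, "IU/L"),
]
patient_2 = [
    Assay("FSH", 33, 1.4, 10, "IU/L"),
    Assay("LH", 56, 1.4, 8, "IU/L"),
    # ASSUMPTION: value and range are both expressed in ng/mL.
    Assay("testosterone", 1.4, 2.5, 9.5, "ng/mL"),
]

for label, panel in (("patient 1", patient_1), ("patient 2", patient_2)):
    for assay in panel:
        print(f"{label}: {assay.name} {assay.value} {assay.unit} -> {assay.flag()}")
```

The flags reproduce the pattern described in the text: low inhibin B with elevated FSH and normal LH in patient 1, and elevated FSH and LH with low testosterone in patient 2.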
Written informed consent was obtained from each patient for the use of his clinical data and biological samples for genetic research and publication purposes. The patients were specifically informed that the genetic research would concern constitutional and pathological genetics. All methods were performed in accordance with French ethics law; under French law, case reports do not require review by an ethics committee. The experimental protocol was approved by the Assistance Publique-Hôpitaux de Paris institutional committee.
PB12.CO3 (46,XX control iPSCs, kindly provided by C. Monville) and UHOMi002-A (46,XY control iPSCs, PUBMED: 33099111) were derived from two fertile individuals and were used for RT-PCR analysis. The 46,XX control iPSC line was used for derivation toward the naive state of pluripotency and for differentiation into PGC-LCs, following the same experimental protocol as for patients 1 and 2. We were not authorized to use the 46,XY iPSC line for PGC-LC differentiation, but we could use it for RT-qPCR analysis at the primed and naive stages.
Conventional and molecular cytogenetic analysis. Standard chromosomal analyses were performed on cultured peripheral lymphocytes from both patients and on derived iPSC clones, by standard procedures [G-banding with trypsin using Giemsa (GTG); R-banding after heat denaturation and Giemsa (RHG)].
FISH analyses were performed on metaphase spreads of lymphocytes from patient 2. The following probes were used, in accordance with the manufacturers' recommendations: centromeric probes specific for chromosomes 18, X and Y (Vysis) and SRY probes (Cytocell); the SRY probe mix also contains control probes for the X centromere (DXZ1) and for chromosome Y (DYZ1, the heterochromatic block at Yq12). Images were created with CytoVision 7.0 software (Leica Biosystems).
Genomic DNA was isolated from both patients' peripheral blood and from derived iPSCs with the Maxwell® 16 Blood DNA Purification Kit (Promega, Biotech). The concentration and quality of the extracted DNA were evaluated with a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies).
Genomic imbalances in patients 1 and 2 were analyzed by oligonucleotide-based array comparative genomic hybridization (array-CGH). A high-resolution 1 M microarray and a 400 K custom microarray were used for each patient (Agilent Technologies, Massy, France). The 1 M microarray covered the whole genome with an intragenic resolution of 1.8 kb. The custom design had whole-genome coverage at 5 kb resolution and was strongly enriched in 363 genes, with an exonic resolution of 0.1 kb and an intronic resolution of 0.6 kb. These genes of interest were selected because of their established or potential involvement in reproductive function and sexual development in humans and mice. Hybridization was performed according to the manufacturer's protocol. Image processing and data analysis were performed with CytoGenomics software (4.0.3.12) (Agilent Technologies). The ADM2 algorithm was used for statistical analysis. Copy number variations were considered significant if they were defined by three or more oligonucleotides and were not identified as polymorphisms in the Database of Genomic Variants (http://projects.tcag.ca/cgi-bin/variation/gbrowse/hg19).
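The significance rule stated above (at least three oligonucleotides, and no match to a known polymorphism in the Database of Genomic Variants) can be sketched as a simple filter. This is a minimal illustration, not the CytoGenomics implementation: the `Cnv` record, the function names, and the 50% reciprocal-overlap criterion used here to decide whether a call matches a known polymorphism are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Cnv:
    """Hypothetical CNV record; field names are illustrative only."""
    chrom: str
    start: int          # 1-based, inclusive
    end: int            # inclusive
    n_probes: int = 0   # number of oligonucleotides supporting the call


def reciprocal_overlap(a: Cnv, b: Cnv) -> float:
    """Return the smaller of the two overlap fractions (0.0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0.0
    overlap = min(a.end, b.end) - max(a.start, b.start) + 1
    if overlap <= 0:
        return 0.0
    len_a = a.end - a.start + 1
    len_b = b.end - b.start + 1
    return min(overlap / len_a, overlap / len_b)


def is_significant(call: Cnv, polymorphisms: Iterable[Cnv],
                   min_probes: int = 3, max_overlap: float = 0.5) -> bool:
    """Keep a call supported by >= min_probes oligonucleotides that does not
    match a known polymorphism (assumed: reciprocal overlap >= max_overlap)."""
    if call.n_probes < min_probes:
        return False
    return all(reciprocal_overlap(call, p) < max_overlap for p in polymorphisms)
```

In this sketch only the three-probe minimum comes from the text; the overlap threshold would need to be set to whatever matching rule is actually applied when comparing calls with database entries.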

Short-read whole genome sequencing (patient 1). Genome sequencing (GS) was performed with 150-bp paired-end reads on the Illumina HiSeq X Ten platform. Reads were aligned to the human reference genome (hg19) using the Burrows-Wheeler Aligner. Base quality scores were recalibrated using GATK v3.8. Candidate breakpoints were predicted using Lumpy for inversions and translocations and Control-FREEC for duplications and deletions 48. Variants were filtered based on population frequency and their presence in public databases. Candidate breakpoints were visually inspected and selected using the Integrative Genomics Viewer (IGV).
GS yielded a mean read depth of 40× across the whole genome. Structural variants (SVs) were analyzed using an in-house pipeline combining Lumpy and Control-FREEC. The caller outputs, in tab-separated format, were annotated using public databases such as DGV, gnomAD SV, Developmental Delay, and ISCA. Using GS data, Lumpy predicted 1705 structural variation breakpoints, including 958 chromosomal translocations and 276 inversions, and Control-FREEC predicted 471 deletions and duplications.
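The filtering of annotated, tab-separated SV calls by population frequency can be illustrated as follows. This is a sketch under stated assumptions, not the in-house pipeline: the column names (`sv_type`, `gnomad_sv_af`), the 1% frequency cutoff, and the three toy records are hypothetical and chosen only to make the example runnable.

```python
import csv
import io
from collections import Counter

# Toy tab-separated input; records and column names are made up for illustration.
EXAMPLE_TSV = (
    "chrom\tstart\tend\tsv_type\tgnomad_sv_af\n"
    "chr1\t100\t200\tDEL\t0.25\n"
    "chr2\t300\t900\tINV\t.\n"
    "chr3\t500\t500\tBND\t0.001\n"
)


def load_sv_calls(handle):
    """Read annotated SV calls from a tab-separated file-like object."""
    return list(csv.DictReader(handle, delimiter="\t"))


def filter_rare(calls, max_af=0.01, af_column="gnomad_sv_af"):
    """Keep calls whose population frequency is <= max_af (assumed cutoff).
    A missing value ('', '.' or absent) is treated as 'not seen', i.e. AF = 0."""
    kept = []
    for row in calls:
        raw = row.get(af_column)
        af = float(raw) if raw not in (None, "", ".") else 0.0
        if af <= max_af:
            kept.append(row)
    return kept


def count_by_type(calls, type_column="sv_type"):
    """Tally calls per SV type (e.g. DEL, DUP, INV, BND)."""
    return Counter(row[type_column] for row in calls)


rare_calls = filter_rare(load_sv_calls(io.StringIO(EXAMPLE_TSV)))
print(count_by_type(rare_calls))
```

Treating a missing frequency as zero keeps variants absent from the reference databases, which is usually the desired behavior when looking for rare candidates; the surviving calls would still require visual inspection, as described for IGV.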
Whole genome sequencing (patient 2). WGS [...] sample, in order to reach an average sequencing depth of 30× for each sample. Sequence quality parameters were assessed throughout the sequencing run, and standard bioinformatics analysis of the sequencing data was based on the Illumina pipeline to generate a FASTQ file for each sample. After demultiplexing, sequences were aligned to the human reference genome hg19 using the Burrows-Wheeler Aligner 49. Downstream processing was carried out with the Genome Analysis Toolkit (GATK), SAMtools, and Picard, following documented best practices (http://www.broadinstitute.org/gatk/guide/topic?name=best-practices). Variant calls were made with GATK HaplotypeCaller version 3. Structural variants were assessed using Manta 50, Canvas 51 and WisecondorX 52. Annotation was based on the latest releases of the Gencode database (v31) 53, gnomAD (v2.1) 54, ClinVar (v20190815), and CADD (v1.4) 55. Variants were annotated and filtered using the Polyweb software interface designed by the Bioinformatics platform of University Paris Descartes.
Patient-derived iPSC generation and pluripotency characterization. Patient 2-specific iPSCs were generated by reprogramming his erythroblasts using Sendai virus-mediated gene transfer, and iPSC derivation was carried out as previously described 24. PCR and immunofluorescence analyses were used to assess endogenous expression of pluripotency marker genes in patient 2-specific iPSCs. In vivo assessment of pluripotency was performed by teratoma formation and histological analysis. Animal experiments were performed according to protocols approved by the local animal ethics advisory committee, registered at the French research ministry in accordance with French national regulation (national transposition of European directive 2010/63/CE). Mouse experimentation was approved by the Commissariat à l'Energie Atomique et aux Energies Alternatives, 92265 Fontenay aux Roses, France.
Immunohistochemical staining for TFAP2C and D2-40. Four to five sections from each EB clone were mounted on slides. The paraffin was removed, and sections were rehydrated in successive baths of toluene, 100% ethanol, 96% ethanol, and H2O, and finally stained with hematoxylin-eosin. Endogenous peroxidase activity and non-specific protein binding were blocked by incubating the sections in H2O2 followed by 2.5% normal horse serum. The sections were then incubated with TFAP2C or D2-40 mouse primary antibodies for 1 h at 37 °C. The primary antibody was detected by incubation with an anti-mouse secondary antibody. Finally, peroxidase activity was detected with DAB. Sections were rinsed in PBS between each step. Analyses and quantification were performed with Histolab version 11.5.1 analysis software (Microvision Instruments, https://www.microvision.fr).
Immunofluorescent staining and imaging analysis. For immunofluorescence staining (5hmC, D2-40), paraffin sections of EBs were subjected to antigen retrieval with Tris-EDTA (121 °C, 20 min) and then blocked in 0.2% gelatin, 0.05% Tween 20, and 0.2% BSA in 1X PBS for 1 h before adding antibodies. The primary antibodies used in this study were as follows: monoclonal rabbit anti-5hmC (Active Motif, ref #39769; 1:400) and monoclonal mouse anti-D2-40 (Dako, ref M3619; 1:200). Specific rabbit and mouse secondary antibodies were conjugated with Alexa Fluor 594 and 488, respectively (1:500). Slides were mounted in Vectashield medium. Images were acquired with a laser-scanning confocal microscope (confocal spinning disk, W1) and analysed using MetaMorph (Molecular Devices, https://fr.moleculardevices.com) and ImageJ 1.52 (National Institutes of Health, https://imagej.nih.gov) software.
RNA extraction and reverse transcription reaction.
Total RNA was extracted using the RNeasy mini kit (Qiagen, Valencia, CA, USA) according to the manufacturer's instructions. Reverse transcription was carried out using the High-Capacity kit (Applied Biosystems, Foster City, CA, USA) following the manufacturer's instructions. A maximum of 1 µg of RNA was used per RT reaction. cDNA synthesis was performed in a total volume of 20 µL, as follows: 2 µL of 10× RT random primers, 2 µL of 10× RT buffer, 1 µL of RNase inhibitor, 0.8 µL of 25× dNTP mix (4 mM), and 1 µL of 50 U/µL MultiScribe Reverse Transcriptase, made up to 20 µL with nuclease-free H2O. After gentle mixing and incubation at 25 °C for 10 min, cDNA synthesis was performed at 37 °C for 2 h, followed by 85 °C for 5 min to inactivate the reverse transcriptase. Diluted or undiluted cDNA was used in qPCR immediately or stored at −20 °C until use.
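The arithmetic of the 20 µL reaction described above can be checked with a short calculation. The reagent volumes and the 1 µg RNA ceiling are taken from the text; the function name `rt_reaction`, the dictionary layout, and the example RNA concentration are assumptions introduced for illustration.

```python
# Per-reaction reagent volumes (µL) as listed in the text.
RT_REAGENTS_UL = {
    "10x RT random primers": 2.0,
    "10x RT buffer": 2.0,
    "RNase inhibitor": 1.0,
    "25x dNTP mix": 0.8,
    "MultiScribe reverse transcriptase (50 U/uL)": 1.0,
}
TOTAL_VOLUME_UL = 20.0
MAX_RNA_UG = 1.0  # maximal RNA input per reaction, from the text


def rt_reaction(rna_conc_ng_per_ul, rna_ug=MAX_RNA_UG):
    """Return the volume (µL) of every component of one RT reaction.

    rna_conc_ng_per_ul: measured RNA concentration (hypothetical input).
    Raises ValueError if the RNA is too dilute to fit in the 20 µL reaction.
    """
    if rna_conc_ng_per_ul <= 0:
        raise ValueError("RNA concentration must be positive")
    if rna_ug > MAX_RNA_UG:
        raise ValueError("RNA input exceeds the 1 µg maximum")
    rna_volume = rna_ug * 1000.0 / rna_conc_ng_per_ul
    reagent_volume = sum(RT_REAGENTS_UL.values())
    water_volume = TOTAL_VOLUME_UL - reagent_volume - rna_volume
    if water_volume < 0:
        raise ValueError("RNA too dilute: reduce the input or concentrate the sample")
    return {**RT_REAGENTS_UL, "RNA": rna_volume, "nuclease-free H2O": water_volume}


# Example with an assumed RNA stock at 200 ng/µL.
for component, volume in rt_reaction(200.0).items():
    print(f"{component}: {volume:.1f} µL")
```

The listed reagents total 6.8 µL, leaving 13.2 µL for RNA plus water; a full 1 µg input therefore requires an RNA stock of at least about 76 ng/µL, and a more dilute sample means a lower input.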